D² Absolute Error Score (d2_absolute_error_score)#
d2_absolute_error_score is a baseline-relative regression score based on absolute error (L1). It is the L1 analogue of r2_score:
R² compares your squared error to the mean baseline (best constant under L2).
D²_absolute compares your absolute error to the median baseline (best constant under L1).
Best possible score is 1.0.
1.0 → perfect predictions.
0.0 → no better than always predicting the (weighted) median of y_true.
< 0 → worse than the median baseline (can be arbitrarily negative).
Learning goals#
Write the metric in math notation and interpret its values.
See why the median is the baseline for absolute error.
Implement d2_absolute_error_score from scratch in NumPy (including sample weights).
Use the metric while optimizing a simple L1 linear regression model.
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from sklearn.metrics import d2_absolute_error_score as sk_d2_absolute_error_score
from sklearn.metrics import mean_absolute_error as sk_mean_absolute_error
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
1) Definition (L1 analogue of R²)#
For targets $y_1,\dots,y_n$ and predictions $\hat{y}_1,\dots,\hat{y}_n$, define the mean absolute error (MAE):

$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|.$$

Let $m$ be the empirical median of $y$ (or the weighted median if sample_weight is provided). The median baseline is the constant predictor

$$\tilde{y}_i = m \quad \text{for all } i.$$

Then the D² absolute error score is

$$D^2_{\mathrm{AE}}(y, \hat{y}) = 1 - \frac{\mathrm{MAE}(y, \hat{y})}{\mathrm{MAE}(y, \tilde{y})}.$$
This is exactly what scikit-learn implements: d2_absolute_error_score(y_true, y_pred) is the special case of d2_pinball_score(..., alpha=0.5).
Multi-output#
If $y \in \mathbb{R}^{n \times m}$, scikit-learn computes a score per output and then averages them (uniformly by default).
Important edge case (constant targets)#
If all $y_i$ are equal, then $\mathrm{MAE}(y, \tilde{y}) = 0$.
Perfect predictions → score 1.0.
Otherwise → scikit-learn returns 0.0 (because the baseline has zero error).
# A tiny example
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
m = np.median(y_true)
mae_model = np.mean(np.abs(y_true - y_pred))
mae_baseline = np.mean(np.abs(y_true - m))
print("median m =", m)
print("MAE(model) =", mae_model)
print("MAE(baseline) =", mae_baseline)
print("D²_AE manual =", 1 - mae_model / mae_baseline)
print("D²_AE sklearn =", sk_d2_absolute_error_score(y_true, y_pred))
median m = 2.5
MAE(model) = 0.5
MAE(baseline) = 2.125
D²_AE manual = 0.7647058823529411
D²_AE sklearn = 0.7647058823529411
2) Why the baseline is the median#
For an L2 loss, the best constant prediction is the mean.
For an L1 loss, the best constant prediction is the median.
Consider the optimization problem over constants $c$:

$$\min_c \; f(c) = \sum_{i=1}^n |y_i - c|.$$

$f(c)$ is convex and piecewise-linear.
A (sub)gradient of $|y_i - c|$ w.r.t. $c$ is $\mathrm{sign}(c - y_i)$ (with any value in $[-1,1]$ allowed when $c = y_i$).
So a subgradient of $f$ is

$$\partial f(c) \ni \sum_{i=1}^n \mathrm{sign}(c - y_i).$$

At an optimum $c^*$, we need $0 \in \partial f(c^*)$, which happens when roughly half the mass is on each side:
$\#\{i : y_i \le c^*\} \ge n/2$
$\#\{i : y_i \ge c^*\} \ge n/2$
That’s exactly the defining property of a median.
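A quick numeric check of this optimality condition, on a fresh random sample:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=101)  # odd n, so the sample median is one of the data points
m = np.median(y)

# At the median, at least half the points lie on each side (the subgradient set contains 0).
assert (y <= m).sum() >= len(y) / 2
assert (y >= m).sum() >= len(y) / 2

# Nudging the constant away from the median cannot decrease the MAE.
mae = lambda c: np.mean(np.abs(y - c))
assert mae(m) <= mae(m + 0.05) and mae(m) <= mae(m - 0.05)
```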
# Visual: MAE of a constant predictor c is minimized at the median
y = rng.normal(loc=0.0, scale=1.0, size=101)
c_grid = np.linspace(y.min() - 1.0, y.max() + 1.0, 400)
mae_c = np.array([np.mean(np.abs(y - c)) for c in c_grid])
m = np.median(y)
fig = go.Figure()
fig.add_trace(go.Scatter(x=c_grid, y=mae_c, mode="lines", name="MAE(y, c)"))
fig.add_vline(x=float(m), line_dash="dash", line_color="#E45756", annotation_text="median")
fig.update_layout(
title="For L1, the best constant predictor is the median",
xaxis_title="constant prediction c",
yaxis_title="MAE(y, c)",
height=420,
)
fig.show()
print("median:", m)
print("argmin grid approx:", c_grid[np.argmin(mae_c)])
median: -0.016801157504288795
argmin grid approx: -0.01921790647673083
3) Interpreting values: 1, 0, negative#
Because D² is a ratio to the median baseline:
1.0 means $\mathrm{MAE}(y,\hat{y}) = 0$.
0.0 means $\mathrm{MAE}(y,\hat{y}) = \mathrm{MAE}(y,\tilde{y})$ (you match the baseline).
-1.0 means your MAE is twice the baseline MAE.
More generally, if $\mathrm{MAE}(y,\hat{y}) = k\,\mathrm{MAE}(y,\tilde{y})$, then $D^2_{\mathrm{AE}} = 1 - k$.
Below we build a few predictors on the same y_true and compare them.
y_true = 1.5 + rng.normal(size=80)
m = np.median(y_true)
baseline = np.full_like(y_true, m)
models = {
"Perfect": y_true,
"Noisy": y_true + rng.normal(0, 0.4, size=y_true.shape),
"Median baseline": baseline,
"Anti-median": 2 * m - y_true, # doubles every |y-m|, so D² = 1 - 2 = -1
}
rows = []
for name, y_pred in models.items():
mae_model = np.mean(np.abs(y_true - y_pred))
mae_base = np.mean(np.abs(y_true - baseline))
d2 = sk_d2_absolute_error_score(y_true, y_pred)
rows.append((name, mae_model, mae_base, d2))
rows
[('Perfect', 0.0, 0.7787010392304238, 1.0),
('Noisy', 0.33910291099295586, 0.7787010392304238, 0.5645274708659884),
('Median baseline', 0.7787010392304238, 0.7787010392304238, 0.0),
('Anti-median', 1.5574020784608475, 0.7787010392304238, -1.0)]
# Visual: MAE(model) vs MAE(baseline) and the resulting D²
names = [r[0] for r in rows]
mae_model = np.array([r[1] for r in rows])
mae_base = np.array([r[2] for r in rows])
d2 = np.array([r[3] for r in rows])
fig = make_subplots(rows=1, cols=2, subplot_titles=("Absolute errors", "D²_AE score"))
fig.add_trace(go.Bar(x=names, y=mae_model, name="MAE(model)", marker_color="#4C78A8"), row=1, col=1)
fig.add_trace(go.Bar(x=names, y=mae_base, name="MAE(median baseline)", marker_color="#F58518"), row=1, col=1)
fig.update_yaxes(title_text="MAE", row=1, col=1)
fig.add_trace(go.Bar(x=names, y=d2, name="D²_AE", marker_color="#54A24B"), row=1, col=2)
fig.add_hline(y=0, line_dash="dash", line_color="gray", row=1, col=2)
fig.update_yaxes(title_text="score", row=1, col=2)
fig.update_layout(barmode="group", height=420)
fig.show()
4) From-scratch NumPy implementation#
The implementation below mirrors scikit-learn’s behavior:
supports 1D and multi-output targets
supports sample_weight
supports multioutput aggregation: 'raw_values' (per-output), 'uniform_average' (default), or array-like weights (length = n_outputs)
We also reproduce scikit-learn’s convention for the constant-target edge case.
def _to_2d(y):
y = np.asarray(y)
if y.ndim == 1:
return y.reshape(-1, 1)
if y.ndim == 2:
return y
raise ValueError(f"y must be 1D or 2D, got shape {y.shape}")
def weighted_percentile_lower(array, sample_weight, percentile=50.0):
'''Lower weighted percentile (matches sklearn.utils.stats._weighted_percentile).
Parameters
----------
array : (n,) or (n, m)
sample_weight : (n,) or same shape as array
percentile : float in [0, 100]
Returns
-------
q : float or (m,) ndarray
'''
array = np.asarray(array)
w = np.asarray(sample_weight)
n_dim = array.ndim
if n_dim == 0:
return array.item()
if array.ndim == 1:
array = array.reshape(-1, 1)
if w.ndim == 1:
if w.shape[0] != array.shape[0]:
raise ValueError("sample_weight must have shape (n_samples,)")
w = np.tile(w.reshape(-1, 1), (1, array.shape[1]))
elif w.shape != array.shape:
raise ValueError("sample_weight must have shape (n_samples,) or array.shape")
if np.any(w < 0):
raise ValueError("sample_weight must be non-negative")
sorted_idx = np.argsort(array, axis=0)
sorted_array = np.take_along_axis(array, sorted_idx, axis=0)
sorted_w = np.take_along_axis(w, sorted_idx, axis=0).astype(float)
cdf = np.cumsum(sorted_w, axis=0)
total = cdf[-1]
adj = (percentile / 100.0) * total
adj = np.asarray(adj, dtype=float)
# percentile=0: ignore leading zeros (sklearn GH20528 behavior)
mask = adj == 0
if np.any(mask):
adj[mask] = np.nextafter(adj[mask], adj[mask] + 1)
idx = np.array([np.searchsorted(cdf[:, j], adj[j]) for j in range(array.shape[1])])
idx = np.clip(idx, 0, array.shape[0] - 1)
q = sorted_array[idx, np.arange(array.shape[1])]
return q[0] if n_dim == 1 else q
def mae_raw_values(y_true, y_pred, sample_weight=None):
'''Per-output MAE (returns shape (n_outputs,)).'''
y_true = _to_2d(y_true).astype(float)
y_pred = _to_2d(y_pred).astype(float)
if y_true.shape != y_pred.shape:
raise ValueError(f"Shape mismatch: y_true {y_true.shape}, y_pred {y_pred.shape}")
abs_err = np.abs(y_true - y_pred)
if sample_weight is None:
return abs_err.mean(axis=0)
w = np.asarray(sample_weight, dtype=float).reshape(-1)
if w.shape[0] != y_true.shape[0]:
raise ValueError("sample_weight must have shape (n_samples,)")
if np.any(w < 0):
raise ValueError("sample_weight must be non-negative")
w_sum = w.sum()
if w_sum == 0:
raise ValueError("sample_weight sum must be > 0")
return (abs_err * w[:, None]).sum(axis=0) / w_sum
def d2_absolute_error_score_numpy(
y_true,
y_pred,
*,
sample_weight=None,
multioutput="uniform_average",
):
'''NumPy implementation of sklearn.metrics.d2_absolute_error_score.'''
y_true_2d = _to_2d(y_true)
y_pred_2d = _to_2d(y_pred)
if y_true_2d.shape != y_pred_2d.shape:
raise ValueError(f"Shape mismatch: y_true {y_true_2d.shape}, y_pred {y_pred_2d.shape}")
n = y_true_2d.shape[0]
m = y_true_2d.shape[1]
if n < 2:
return float("nan")
num = mae_raw_values(y_true_2d, y_pred_2d, sample_weight=sample_weight)
if sample_weight is None:
med = np.percentile(y_true_2d, q=50, axis=0)
else:
med = weighted_percentile_lower(y_true_2d, sample_weight=sample_weight, percentile=50)
den = mae_raw_values(y_true_2d, np.tile(med, (n, 1)), sample_weight=sample_weight)
scores = np.ones(m, dtype=float)
nonzero_num = num != 0
nonzero_den = den != 0
valid = nonzero_num & nonzero_den
scores[valid] = 1.0 - (num[valid] / den[valid])
scores[nonzero_num & ~nonzero_den] = 0.0
if isinstance(multioutput, str):
if multioutput == "raw_values":
return scores
if multioutput != "uniform_average":
raise ValueError("multioutput must be 'raw_values', 'uniform_average', or array-like")
weights = None
else:
weights = np.asarray(multioutput, dtype=float)
if weights.shape != (m,):
raise ValueError(f"multioutput weights must have shape ({m},)")
return float(np.average(scores, weights=weights))
# Quick checks vs scikit-learn
# 1D
n = 120
y_true = rng.normal(size=n)
y_pred = y_true + rng.normal(0, 0.5, size=n)
print("1D")
print("numpy :", d2_absolute_error_score_numpy(y_true, y_pred))
print("sklearn:", sk_d2_absolute_error_score(y_true, y_pred))
# Weighted
w = rng.uniform(0.2, 2.0, size=n)
print()
print("Weighted")
print("numpy :", d2_absolute_error_score_numpy(y_true, y_pred, sample_weight=w))
print("sklearn:", sk_d2_absolute_error_score(y_true, y_pred, sample_weight=w))
# Multioutput
Y_true = rng.normal(size=(80, 3))
Y_pred = Y_true + rng.normal(0, 0.8, size=(80, 3))
print()
print("Multioutput (raw_values)")
print("numpy :", d2_absolute_error_score_numpy(Y_true, Y_pred, multioutput="raw_values"))
print("sklearn:", sk_d2_absolute_error_score(Y_true, Y_pred, multioutput="raw_values"))
print()
print("Multioutput (weighted average)")
weights = np.array([0.2, 0.3, 0.5])
print("numpy :", d2_absolute_error_score_numpy(Y_true, Y_pred, multioutput=weights))
print("sklearn:", sk_d2_absolute_error_score(Y_true, Y_pred, multioutput=weights))
1D
numpy : 0.4731975975371019
sklearn: 0.4731975975371019
Weighted
numpy : 0.4452618667501379
sklearn: 0.4452618667501379
Multioutput (raw_values)
numpy : [0.3068 0.2318 0.2531]
sklearn: [0.3068 0.2318 0.2531]
Multioutput (weighted average)
numpy : 0.2574517183454593
sklearn: 0.2574517183454593
5) Using D² while optimizing a model (from scratch)#
On a fixed dataset $\{(x_i, y_i)\}_{i=1}^n$, the baseline term

$$\mathrm{MAE}(y, \tilde{y}) = \mathrm{MAE}\bigl(y,\ \mathrm{median}(y)\bigr)$$

depends only on $y$, not on the model parameters $\theta$. Therefore

$$D^2_{\mathrm{AE}}(\theta) = 1 - \frac{\mathrm{MAE}(\theta)}{\mathrm{MAE}_\text{baseline}},$$

and maximizing $D^2_{\mathrm{AE}}$ is equivalent to minimizing MAE.
L1 linear regression#
With a linear model $\hat{y} = Xw$, the MAE objective is

$$L(w) = \frac{1}{n}\sum_{i=1}^n |(Xw)_i - y_i|.$$

A subgradient is

$$g(w) = \frac{1}{n} X^\top s \in \partial L(w), \quad s_i = \mathrm{sign}((Xw)_i - y_i),$$

where $\mathrm{sign}(0)$ may be replaced by any value in $[-1,1]$. In code we use np.sign, which returns 0 at an exactly zero residual.
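For reference, np.sign at a zero residual:

```python
import numpy as np

# np.sign returns 0.0 at exactly zero, which is one valid choice from the
# subgradient set [-1, 1]; elsewhere it matches the true gradient.
print(np.sign(np.array([-2.0, 0.0, 3.0])))
```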
Below we fit an L1 regression model with subgradient descent and track both MAE and (D^2_{\mathrm{AE}}).
# Data with outliers: OLS (L2) can be pulled around; L1 is more robust.
n = 250
x = rng.uniform(-3, 3, size=n)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, size=n)
# Add a handful of strong outliers
out_idx = rng.choice(n, size=18, replace=False)
y[out_idx] += rng.normal(12, 4, size=out_idx.shape[0])
X = np.column_stack([np.ones(n), x])
# Baseline
m = np.median(y)
y_baseline = np.full_like(y, m)
mae_baseline = np.mean(np.abs(y - y_baseline))
# OLS (minimizes MSE, not MAE)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat_ols = X @ w_ols
print("MAE baseline:", mae_baseline)
print("D²_AE (OLS):", sk_d2_absolute_error_score(y, y_hat_ols))
MAE baseline: 3.1399640289860704
D²_AE (OLS): 0.3731633735620893
def fit_l1_linear_subgradient(X, y, *, lr0=0.4, n_steps=600):
'''Minimize MAE(y, Xw) with subgradient descent.'''
X = np.asarray(X, dtype=float)
y = np.asarray(y, dtype=float).reshape(-1)
n, d = X.shape
w = np.zeros(d)
mae_hist = []
d2_hist = []
# Constant baseline for D²
m = np.median(y)
mae_base = np.mean(np.abs(y - m))
for t in range(n_steps):
y_hat = X @ w
r = y_hat - y
# Subgradient of MAE
g = (X.T @ np.sign(r)) / n
lr = lr0 / np.sqrt(t + 1) # simple diminishing step size
w = w - lr * g
mae = np.mean(np.abs(r))
d2 = 1.0 if mae == 0 else 1.0 - mae / mae_base
mae_hist.append(mae)
d2_hist.append(d2)
return w, np.array(mae_hist), np.array(d2_hist)
w_l1, mae_hist, d2_hist = fit_l1_linear_subgradient(X, y)
y_hat_l1 = X @ w_l1
print("w_ols:", w_ols)
print("w_l1 :", w_l1)
print("D²_AE (L1):", sk_d2_absolute_error_score(y, y_hat_l1))
w_ols: [2.8653 1.5073]
w_l1 : [2.0334 1.5742]
D²_AE (L1): 0.44939361486933627
# Optimization diagnostics
iters = np.arange(len(mae_hist))
fig = make_subplots(rows=1, cols=2, subplot_titles=("D²_AE vs iteration", "MAE vs iteration"))
fig.add_trace(go.Scatter(x=iters, y=d2_hist, mode="lines", name="D²_AE"), row=1, col=1)
fig.add_hline(y=0, line_dash="dash", line_color="gray", row=1, col=1)
fig.update_xaxes(title_text="iteration", row=1, col=1)
fig.update_yaxes(title_text="score", row=1, col=1)
fig.add_trace(go.Scatter(x=iters, y=mae_hist, mode="lines", name="MAE"), row=1, col=2)
fig.update_xaxes(title_text="iteration", row=1, col=2)
fig.update_yaxes(title_text="MAE", row=1, col=2)
fig.update_layout(height=420)
fig.show()
# Fit visualization
x_line = np.linspace(x.min(), x.max(), 200)
X_line = np.column_stack([np.ones_like(x_line), x_line])
y_line_ols = X_line @ w_ols
y_line_l1 = X_line @ w_l1
y_line_baseline = np.full_like(x_line, m)
colors = np.where(np.isin(np.arange(n), out_idx), "#E45756", "#4C78A8")
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x,
y=y,
mode="markers",
name="data",
marker=dict(size=6, opacity=0.65, color=colors),
)
)
fig.add_trace(go.Scatter(x=x_line, y=y_line_baseline, mode="lines", name="median baseline", line=dict(dash="dot")))
fig.add_trace(go.Scatter(x=x_line, y=y_line_ols, mode="lines", name="OLS (L2)", line=dict(width=3)))
fig.add_trace(go.Scatter(x=x_line, y=y_line_l1, mode="lines", name="L1 via MAE", line=dict(width=3)))
fig.update_layout(
title="L1 (MAE) fit is less sensitive to outliers (red points)",
xaxis_title="x",
yaxis_title="y",
height=520,
)
fig.show()
6) Pros, cons, and when to use#
Pros#
Baseline-relative and interpretable: 0 = “no better than predicting the median”.
Robust to outliers (compared to L2-based scores): absolute error grows linearly.
Scale- and shift-invariant: if you apply the same scale/shift to both y_true and y_pred, the score stays the same.
Connects to quantile regression: it is d2_pinball_score(alpha=0.5).
Cons / pitfalls#
Non-smooth objective: optimizing MAE (and thus D²) needs subgradients / specialized solvers (or smooth approximations like Huber).
Can be negative and unbounded below: easy to misread if you expect a score in $[0,1]$.
Not meaningful for a single sample: returns NaN for n_samples < 2.
Constant-target edge case: if y_true is constant, the baseline error is 0; scikit-learn returns 1.0 for perfect predictions, otherwise 0.0.
Good use cases#
You care about typical error magnitude (median-like behavior), not rare extreme misses.
Data has heavy tails / outliers and you want a robustness-oriented score.
Model selection where a baseline-relative, unitless score is easier to compare than raw MAE.
Exercises#
Create a dataset where OLS has higher R² but lower D²_AE than an L1 fit. Explain why.
Show numerically that d2_absolute_error_score is identical to d2_pinball_score(alpha=0.5).
Implement a Huber regression from scratch and compare its D²_AE to OLS and L1.
With sample weights, build an example where the weighted median baseline shifts drastically and changes the score.
References#
scikit-learn API: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.d2_absolute_error_score.html
scikit-learn user guide (D²): https://scikit-learn.org/stable/modules/model_evaluation.html#d2-score
Koenker & Machado (1999), quantile regression goodness-of-fit: https://doi.org/10.1080/01621459.1999.10473882